Home Credit Default Risk Competition

Overview:

Problem statement:

Scope:

Import needed libraries and read the data

Automated Exploratory Data Analysis

Handle missing data

Handle Outliers

Cluster missing data for later profiling

Data imputation & prep

Data Imputation

Data upsampling

Feature importance

Correlation

Dimensionality Reduction

Data Modeling

Base Model

Random Forest

The observed F1 score is the same as that of the unoptimized model, so we need to find a different way to optimize the tree, change the search space, or try a different model or feature set.
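For reference, here is a minimal sketch of how the F1 score used in this comparison can be computed from binary labels. The function name `f1_score_binary` and the toy label lists are hypothetical and exist only for illustration; they are not taken from the notebook or from the competition data. If two models produce the same counts of true positives, false positives, and false negatives, their F1 scores will be identical, which is one way a tuned and an untuned model can end up with the same score.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Compute the F1 score for the positive class (label 1).

    F1 = 2 * TP / (2 * TP + FP + FN), which is the harmonic mean of
    precision and recall. Returns 0.0 when there are no true positives,
    false positives, or false negatives.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


# Hypothetical toy labels (1 = default, 0 = repaid), for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_base = [1, 0, 0, 1, 0, 1, 1, 0]
y_pred_tuned = [1, 0, 0, 1, 0, 1, 1, 0]  # same predictions as the base model

print("base  F1:", f1_score_binary(y_true, y_pred_base))
print("tuned F1:", f1_score_binary(y_true, y_pred_tuned))
```

Comparing the predictions themselves, not only the score, helps tell apart a search that returned the default hyperparameters from one that found a different model with the same error counts.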

Random Forest using Bayesian optimization

Thank you